Improved Multiple Block Adder Using Carry Increment Adder

[0001] The present application claims priority from U.S. Provisional Application serial no. 60/269,450, filed December 22, 2000, entitled "A Low Power and High Performance Multiply Accumulate (MAC) Module" of Kaoru Awaka et al. This disclosure is incorporated herein by reference.
Field of Invention: 

[0002] This invention relates to an improved adder architecture in which a carry increment adder is used together with a carry lookahead adder.

Background of Invention: 

[0003] A conventional N-bit adder comprises adder building blocks. A common adder building block is a full adder that takes as inputs bit A, bit B and carry-in bit Cin and produces sum S and carry-out Cout, as illustrated in Figure 1. A cascade of N full adders can be used to provide an N-bit ripple carry adder, as illustrated in Figure 2. Figure 2 illustrates three full adders adding three bits at input A (bits 0-2) to three bits at input B (bits 0-2) to obtain sum bits S0-S2 and carry-out Cout. In a ripple carry adder, the sum outputs are updated starting from the lower bits: each higher bit waits for the carry to propagate from the lower-bit adder. A ripple carry adder is too slow for most long adders, since an N-bit ripple carry adder takes N full-adder delays.
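A minimal behavioral sketch of the full adder of Figure 1 and the ripple carry adder of Figure 2, in Python (the function names are illustrative, not from the source). The loop makes the serial dependency explicit: bit i cannot finish until the carry from bit i-1 is known, which is why an N-bit ripple carry adder needs N full-adder delays in the worst case.

```python
def full_adder(a, b, cin):
    """Full adder of Figure 1: single-bit inputs A, B, Cin -> (S, Cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, n, cin=0):
    """N-bit ripple carry adder of Figure 2 built from n cascaded full adders.

    The carry out of stage i is the carry in of stage i + 1, so the
    worst-case delay grows linearly with n.
    """
    carry, total = cin, 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry  # (sum bits S[n-1:0], Cout)


# Three-bit example as in Figure 2: 0b101 + 0b011 = 0b1000
print(ripple_carry_add(0b101, 0b011, 3))  # (0, 1)
```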

[0004] The delay can be reduced by a carry lookahead adder (CLA), which computes the carry through several bits using one complex gate instead of a cascade of several full adders. An example of a 16-bit lookahead adder is illustrated in Fig. 3. It has four 4-bit blocks 11-14 and the lookahead circuits 15-17 to quickly send the carry to the most significant bits at the ripple carry adder block 11 for summing bits 12-15. Each of the blocks 11-14 includes four ripple carry full adders to sum four bits, as illustrated with three bits in Figure 2.
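A sketch of the carry lookahead idea for one 4-bit block, assuming the standard generate/propagate formulation (g = A AND B, p = A XOR B); the source does not give the gate equations of lookahead circuits 15-17, so this is the textbook form rather than the circuit of Fig. 3. Every carry is a flat sum of products of g, p and c0, so no carry has to wait for the previous full adder.

```python
def cla4_add(a, b, c0):
    """4-bit carry lookahead adder: returns (sum bits [3:0], carry out c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: A AND B
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: A XOR B
    # Each carry is one AND-OR expression of g, p and c0 (no ripple chain).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # S = p XOR carry-in
    return s, c4


print(cla4_add(0b1011, 0b0110, 0))  # (1, 1), since 11 + 6 = 17 = 16 + 1
```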

[0005] A high-speed adder can be provided using carry select adders (CSA). A 16-bit carry select adder (CSA) system, illustrated in Figure 4, comprises three 4-bit CSA adder blocks 22-24 and a 4-bit ripple carry adder block 21. The ripple carry adder block 21 adds the four least significant bits [3:0]. The most significant CSA adder block 24 adds the most significant bits [15:12], the next lower CSA adder block 23 adds bits 8-11 ([11:8]), and the lowest CSA adder block 22 adds bits 4-7 ([7:4]). The block separation might be 4-4-4-4 as shown, but may also be 5-4-4-3 or another separation, depending on circuit optimization, input signal delays, etc. Each of the CSA adder blocks 22-24 comprises two ripple carry adders 25 and 26 that pre-compute the carry-in "0" and "1" cases. The two short adders 25 and 26 at each block of four bits speculatively calculate the sum assuming a carry-in of "0" or "1", and when the actual carry-in arrives it drives a multiplexer (MUX) 27 that selects the appropriate sum S.
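A behavioral sketch of the Figure 4 organization, assuming each ripple carry adder is modeled arithmetically and that block widths are listed from least to most significant (the function names and the `widths` parameter are hypothetical). The lowest block is a plain ripple adder; every higher block computes both sums in advance and lets the incoming carry pick one, which is the multiplexer step described above.

```python
def ripple_add(x, y, width, cin):
    """Behavioral stand-in for a width-bit ripple carry adder."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def csa_adder(a, b, widths=(4, 4, 4, 4)):
    """Ripple block for the LSBs, then carry select blocks above it.

    (4, 4, 4, 4) covers bits [3:0], [7:4], [11:8], [15:12] as in Figure 4.
    """
    result, carry, lsb = 0, 0, 0
    for idx, width in enumerate(widths):
        mask = (1 << width) - 1
        x, y = (a >> lsb) & mask, (b >> lsb) & mask
        if idx == 0:
            s, carry = ripple_add(x, y, width, 0)       # ripple block 21
        else:
            s0, c0 = ripple_add(x, y, width, 0)         # adder 25: carry-in "0"
            s1, c1 = ripple_add(x, y, width, 1)         # adder 26: carry-in "1"
            s, carry = (s1, c1) if carry else (s0, c0)  # MUX 27: real carry selects
        result |= s << lsb
        lsb += width
    return result, carry


s, c = csa_adder(0x1234, 0x0FFF)
print(hex(s), c)  # 0x2233 0
```

Passing `widths=(3, 4, 4, 5)` or `(5, 4, 4, 3)` models the alternative separations mentioned above without changing the result.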

[0006] The CSA is one of the fastest adder architectures for realizing a high-performance MAC unit, but it cannot generate carry signals as fast as a CLA can. Since one of the most critical paths of the adder block is the generation of the carry signal to the most significant bit (MSB), a CLA circuit is used to generate the carry signals sent to the MSB.
[0007] A higher-speed adder, a carry select adder (CSA) with carry lookahead adder (CLA) circuits, is illustrated in Figure 5. The example in Figure 5 is a 16-bit adder with a carry lookahead adder (CLA) circuit 28 between each 4-bit CSA adder 29, and between the ripple adder 29a and the CSA 29b, with the CLA circuits used to generate carry signals to the MSB.
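A behavioral sketch of the Figure 5 idea, assuming equal 4-bit blocks and the standard group generate/propagate lookahead (the gate-level form of CLA circuit 28 is not given in the source, and the names here are hypothetical). Block carry-ins are derived from each block's group signals G and P rather than from the previous block's multiplexer output; in hardware this recurrence is flattened into AND-OR gates, while in software it is simply a loop. Block 0, whose carry-in is 0, plays the role of the ripple adder 29a.

```python
def group_gp(x, y, width):
    """Group generate G and group propagate P of one width-bit block."""
    G, P = 0, 1
    for i in range(width):  # fold bit-level g/p from LSB to MSB
        xi, yi = (x >> i) & 1, (y >> i) & 1
        G = (xi & yi) | ((xi ^ yi) & G)  # generated here, or passed up from below
        P &= xi ^ yi                      # every bit so far propagates
    return G, P


def csa_cla16(a, b, width=4, nblocks=4):
    mask = (1 << width) - 1
    blocks = [((a >> (k * width)) & mask, (b >> (k * width)) & mask)
              for k in range(nblocks)]
    # Lookahead: carry into block k+1 = G_k | (P_k & carry into block k).
    carries = [0]
    for x, y in blocks:
        G, P = group_gp(x, y, width)
        carries.append(G | (P & carries[-1]))
    result = 0
    for k, (x, y) in enumerate(blocks):
        s0, s1 = (x + y) & mask, (x + y + 1) & mask  # both sums precomputed
        result |= (s1 if carries[k] else s0) << (k * width)
    return result, carries[-1]


print(csa_cla16(0xFFFF, 0x0001))  # (0, 1)
```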

[0008] It is highly desirable to make this carry path faster without degrading the generation speed of the sum, thereby increasing adder speed.

Summary of Invention: 

[0009] In accordance with one embodiment of the present invention, an adder architecture is provided in which both carry lookahead and carry increment adders are used.

[0010] In accordance with another embodiment of the present invention, a long adder is provided by combining carry select adders and carry increment adders.

Description of Drawing: 

In the drawing: 

[0011] Figure 1 illustrates a full adder according to the prior art.

[0012] Figure 2 illustrates a ripple carry adder according to the prior art. 

[0013] Figure 3 illustrates a carry lookahead adder (CLA) according to the prior art. 

[0014] Figure 4 illustrates a 16-bit adder with CSA adders according to the prior art.

[0015] Figure 5 illustrates a higher speed 16-bit adder with CLA and CSA adders 

according to the prior art. 
[0016] Figure 6 illustrates a long adder with CLA and CSA adders.
[0017] Figure 7 illustrates a carry select adder (CSA). 
[0018] Figure 8 illustrates a carry increment adder (CIA). 

[0019] Figure 9 illustrates an adder configuration according to one embodiment of the present invention using carry select adder CSA, carry increment adder CIA and carry lookahead adder CLA.

[0020] Figure 10 illustrates an adder with a CSA adder for the MSB block, CIA adders for the less significant bits, and a CLA carry for bits [14:0].

[0021] Figure 11 illustrates an adder with a CSA adder for the MSB block, CIA adders for the other, less significant bits, and a CLA carry for both bits [14:0] and bits [30:15].
[0022] Figure 12 illustrates a 41-bit adder with the combination of CSA and CIA adders, without a carry lookahead adder.

[0023] Figure 13 illustrates a more general high-speed adder using carry select adder CSA, carry increment adder CIA and carry lookahead adder CLA.

Description of Preferred Embodiment of the Present Invention: 

[0024] A first preferred embodiment of the present invention is described in connection with the configuration illustrated in Figure 6, which is used in a special high-speed long adder in a multiply accumulate (MAC) module. This special adder is described in Figure 7 of the above-cited application serial no. 60/269,450, filed December 22, 2000, entitled "A Low Power and High Performance Multiply Accumulate (MAC) Module" of Kaoru Awaka et al. In this particular application, the multiplier output is the adder input, so the adder input signals reach the adder at different times. This time difference enables the use of a low-power multiply cell as described in Application serial no. of TI-33253, incorporated herein by reference. The configuration is, for example, a 41-bit + 41-bit adder [bits 40:0]. A higher-speed adder for this application includes carry select adders (CSA) and carry lookahead adders (CLA), as illustrated for example in Fig. 6. The high-speed long adder system 30 of Figure 6 comprises a first CSA[14:0] adder 31 for summing the fifteen least significant bits [14:0], a next higher, middle CSA[30:15] adder 32 combined with a carry lookahead adder (CLA) 35, and a highest-level CSA[40:31] adder 33. The CSA[14:0] adder 31 comprises sub-block adders and is similar to the combined blocks illustrated in Figure 4, with a ripple carry sub-block for the four least significant bits, ripple[3:0], a middle CSA sub-block for the next six higher bits, CSA[9:4], and a highest CSA sub-block for the highest five bits, CSA[14:10]. This CSA[14:0] block can be represented as ripple[3:0] -> CSA[9:4] -> CSA[14:10]. The sub-block separations may vary and depend on circuit optimization, input signal delay, etc. The CSA[30:15] adder 32 receives the carry (carry 14) from the CSA[14:0] adder and has three sub-blocks, for example with a block separation of CSA[18:15] -> CSA[24:19] -> CSA[30:25]. While one carry lookahead adder CLA block 35 is illustrated, there may be multiple lookahead circuits, such as one for every block separation. The CSA[40:31] adder 33 is a single 10-bit CSA block that receives the carry 30 from the carry lookahead adder CLA 35. Each of the CSA adders comprises two ripple carry adders, which pre-compute the carry-in "0" case and "1" case, and a multiplexer. When the carry-in arrives, the "S" output is selected from the "0" case or the "1" case.
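A behavioral sketch of the Figure 6 partition, assuming the example sub-block separation given above and modeling each ripple adder arithmetically (the tuple encoding and function names are hypothetical). It shows the two features the text describes: every CSA block chooses between precomputed sums with its carry-in, and CLA 35 produces carry 30 directly from carry 14 via group generate/propagate over bits [30:15], so CSA[40:31] need not wait for the carry to pass through the middle sub-blocks. The lookahead changes when the carry becomes available, not its value.

```python
# Hypothetical encoding of the Figure 6 partition as (kind, msb, lsb).
FIG6_BLOCKS = [
    ("ripple", 3, 0), ("CSA", 9, 4), ("CSA", 14, 10),   # CSA[14:0] adder 31
    ("CSA", 18, 15), ("CSA", 24, 19), ("CSA", 30, 25),  # CSA[30:15] adder 32
    ("CSA", 40, 31),                                     # CSA[40:31] adder 33
]


def field(v, msb, lsb):
    """Extract bits [msb:lsb] of v."""
    return (v >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def cla_carry(a, b, msb, lsb, cin):
    """Lookahead carry out of bits [msb:lsb] from group generate/propagate."""
    G, P = 0, 1
    for i in range(lsb, msb + 1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        G = (ai & bi) | ((ai ^ bi) & G)
        P &= ai ^ bi
    return G | (P & cin)


def fig6_add(a, b):
    result, carry, carry14 = 0, 0, 0
    for kind, msb, lsb in FIG6_BLOCKS:
        width = msb - lsb + 1
        x, y = field(a, msb, lsb), field(b, msb, lsb)
        if lsb == 31:
            # CLA 35: carry 30 comes straight from carry 14 (same value as
            # the chained carry, but available without the middle ripple).
            carry = cla_carry(a, b, 30, 15, carry14)
        if kind == "ripple":
            total = x + y + carry
        else:
            total = (x + y + 1) if carry else (x + y)  # MUX on precomputed sums
        result |= (total & ((1 << width) - 1)) << lsb
        carry = total >> width
        if msb == 14:
            carry14 = carry
    return result, carry


print(fig6_add((1 << 41) - 1, 1))  # (0, 1)
```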

[0025] Referring to Figure 7, the detail of a carry select adder is illustrated. The carry select adder (CSA) 40 has two ripple carry, or carry-propagate, adders (CPAs) 41 and 43, which speculatively calculate the sum assuming the carry-in equals 0 or 1, and the actual carry-in Ck triggers a MUX 45, which selects the appropriate sum. These two CPAs 41 and 43 increase the load capacitance on the signal lines, since there are two ripple carry adders for each CSA.

[0026] Referring to Figure 8, a carry increment adder (CIA) 50 is illustrated. The CIA 50 has only one ripple carry, or carry propagate, adder CPA 51. In the CIA 50, only the result with carry-in 0 is pre-computed, and it is incremented by 1 afterwards if Ck equals 1. Therefore, compared with a carry select adder (CSA), the CIA 50 can generate carry signals faster because of the drastic reduction in load capacitance. The CIA is better than the CSA in the following respects: 1. fewer ripple carry adders (CPAs, carry propagate adders); 2. fewer transistors; 3. less load capacitance; 4. the possibility of faster carry generation.
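A behavioral comparison of the two block types of Figures 7 and 8, assuming a width-bit block and an incoming block carry Ck (function names hypothetical). Both return the same sum and carry-out; the difference the text describes is structural: `csa_block` uses two ripple adders, while `cia_block` uses one plus an incrementer that acts only when Ck is 1. The incrementer can carry out only when the carry-in-0 sum is all ones.

```python
def cpa(x, y, width, cin):
    """Behavioral carry propagate (ripple carry) adder on a width-bit slice."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def csa_block(x, y, width, ck):
    """Figure 7: CPAs 41 and 43 precompute both cases; MUX 45 selects with Ck."""
    s0, c0 = cpa(x, y, width, 0)
    s1, c1 = cpa(x, y, width, 1)
    return (s1, c1) if ck else (s0, c0)


def cia_block(x, y, width, ck):
    """Figure 8: CPA 51 computes only the carry-in-0 result, then +1 if Ck = 1."""
    s0, c0 = cpa(x, y, width, 0)
    if not ck:
        return s0, c0
    mask = (1 << width) - 1
    s = (s0 + 1) & mask
    # The increment carries out only when s0 is all ones.
    return s, c0 | int(s0 == mask)


print(csa_block(0b1111, 0b0000, 4, 1), cia_block(0b1111, 0b0000, 4, 1))  # (0, 1) (0, 1)
```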

[0027] The new adder architecture according to the present invention, for a configuration like that illustrated in Figure 6, is illustrated in Figure 9. The CSA[30:15] adder 32 and the CSA[14:0] adder 31 are replaced with the carry increment adders CIA[30:15] adder 38 and CIA[14:0] adder 39. The CIA[14:0] and CIA[30:15] adders may have the same sub-block separation as discussed in connection with Figure 6 or a different separation. With the same separation, CIA[14:0] would be ripple[3:0] -> CIA[9:4] -> CIA[14:10], and for CIA[30:15] the sub-blocks CIA[18:15] -> CIA[24:19] -> CIA[30:25] may be used. The CIA is used in place of the CSA for the middle blocks as an improvement to the adder of Fig. 6. A CSA is used for the block 39 that includes the MSB or sign bits, which are used to detect overflow or underflow. In the CSA case, overflow/underflow can be pre-calculated for both the propagated-carry "0" case and "1" case and then selected by the carry. In the CIA case, it is difficult to pre-calculate overflow/underflow because the "S" output is not defined until the propagated carry arrives. The MSB block therefore remains CSA[40:31]. By using the CSA at the most significant bit block 39, overflow signals can be generated faster. As discussed previously, the circuit performance of the adder is determined by the carry signals from the middle block. In the middle block 38, the CIA is used where the CSA was used in the prior art of Figure 6, with carry lookahead (CLA) adders 45 to make the carry signals faster. This change from CSA to CIA reduces load capacitance and results in both power and delay reduction.
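A sketch comparing the Figure 6 and Figure 9 partitions, assuming the same sub-block separation in both and counting only the ripple carry adders (CPAs) and the full-adder cells inside them; incrementers and multiplexers are not counted, so this is a rough structural comparison rather than a transistor count. The block encoding and function names are hypothetical. Replacing the middle CSAs by CIAs leaves the sum unchanged while removing one CPA per replaced block.

```python
FIG6 = [("ripple", 3, 0), ("CSA", 9, 4), ("CSA", 14, 10), ("CSA", 18, 15),
        ("CSA", 24, 19), ("CSA", 30, 25), ("CSA", 40, 31)]
FIG9 = [("ripple", 3, 0), ("CIA", 9, 4), ("CIA", 14, 10), ("CIA", 18, 15),
        ("CIA", 24, 19), ("CIA", 30, 25), ("CSA", 40, 31)]

# Ripple carry adders (CPAs) per block: a CSA duplicates its CPA, a CIA does not.
CPAS = {"ripple": 1, "CIA": 1, "CSA": 2}


def cpa_count(blocks):
    return sum(CPAS[kind] for kind, _, _ in blocks)


def full_adder_cells(blocks):
    """Full-adder cells inside the CPAs only (incrementers and MUXes not counted)."""
    return sum(CPAS[kind] * (msb - lsb + 1) for kind, msb, lsb in blocks)


def add_blocks(a, b, blocks):
    result, carry = 0, 0
    for kind, msb, lsb in blocks:
        width = msb - lsb + 1
        mask = (1 << width) - 1
        x, y = (a >> lsb) & mask, (b >> lsb) & mask
        if kind == "CSA":
            total = (x + y + 1) if carry else (x + y)  # select a precomputed sum
        elif kind == "CIA":
            total = x + y                               # carry-in-0 sum ...
            if carry:
                total += 1                              # ... incremented afterwards
        else:
            total = x + y + carry                       # plain ripple block
        result |= (total & mask) << lsb
        carry = total >> width
    return result, carry


print(cpa_count(FIG6), cpa_count(FIG9))                # 13 8
print(full_adder_cells(FIG6), full_adder_cells(FIG9))  # 78 51
```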

[0028] The CSA can generate the sum signals Si:k faster than the CIA because no increment of the sum is needed when Ck is 1; the pre-computed sum is simply selected. Since fast sum signals are required to generate the overflow detect signal, the most significant bits (MSB) block uses CSAs.
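A sketch of why a CSA suits the MSB block, assuming 41-bit two's-complement operands with bit 40 as the sign bit and the top block CSA[40:31] (constants and function names hypothetical; the lower bits [30:0] are modeled arithmetically). Overflow (positive + positive giving a negative result) and underflow (negative + negative giving a non-negative result) are computed for both carry-in cases before carry 30 arrives, so the carry only drives a final selection.

```python
N = 41                     # operand width, bits [40:0]
TOP_LSB = 31               # CSA[40:31] block; bit 40 is the sign bit
TOP_W = N - TOP_LSB        # 10 bits
TOP_MASK = (1 << TOP_W) - 1


def top_case(x, y, cin):
    """One precomputed case of the top CSA block: (sum, overflow, underflow)."""
    total = x + y + cin
    s = total & TOP_MASK
    sign_x, sign_y, sign_s = x >> (TOP_W - 1), y >> (TOP_W - 1), s >> (TOP_W - 1)
    overflow = int(sign_x == 0 and sign_y == 0 and sign_s == 1)   # pos + pos -> neg
    underflow = int(sign_x == 1 and sign_y == 1 and sign_s == 0)  # neg + neg -> pos
    return s, overflow, underflow


def add41_with_flags(a, b):
    low_mask = (1 << TOP_LSB) - 1
    low = (a & low_mask) + (b & low_mask)      # bits [30:0], behavioral
    carry30 = low >> TOP_LSB
    x, y = (a >> TOP_LSB) & TOP_MASK, (b >> TOP_LSB) & TOP_MASK
    case0 = top_case(x, y, 0)                   # both cases ready before carry 30
    case1 = top_case(x, y, 1)
    s_top, ovf, unf = case1 if carry30 else case0  # carry 30 only drives the MUX
    return (s_top << TOP_LSB) | (low & low_mask), ovf, unf


print(add41_with_flags((1 << 40) - 1, 1))  # (1099511627776, 1, 0)
```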

[0029] The present invention makes the carry signals faster without degrading the delay of other signals. The improvement in circuit performance obtained by the present invention is summarized by the multiply-accumulate (MAC) module level delay and power. The conventional prior-art structure has a delay of 3.995 nanoseconds and uses 0.2174 milliwatts. The new structure has a delay of 3.687 nanoseconds and uses 0.1604 milliwatts. This amounts to a difference of 308 picoseconds and 0.057 milliwatts. The condition for the speed simulation is 1.35 V, 125 °C, weak corner. For the power simulation, nominal transistors were used, and power was measured at 100 MHz. In addition to the speed improvement of 8%, power was reduced by 26% due to the drastic reduction of load capacitance.
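The quoted percentages follow directly from the delay and power figures above; as a quick arithmetic check (rounded):

```latex
\[
\frac{3.995\,\mathrm{ns} - 3.687\,\mathrm{ns}}{3.995\,\mathrm{ns}}
  = \frac{0.308\,\mathrm{ns}}{3.995\,\mathrm{ns}} \approx 7.7\% \approx 8\%
\]
\[
\frac{0.2174\,\mathrm{mW} - 0.1604\,\mathrm{mW}}{0.2174\,\mathrm{mW}}
  = \frac{0.0570\,\mathrm{mW}}{0.2174\,\mathrm{mW}} \approx 26.2\% \approx 26\%
\]
```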
[0030] Since a CIA requires fewer carry propagate adders (CPAs), the number of transistors can be reduced. Thus, the load capacitance becomes smaller, resulting in lower power and higher performance.

[0031] As stated previously, the configuration of Figure 9 (which is a modification of Figure 6) is only an example. Other configurations for the same number of bits may be like those illustrated in Figures 10-12. Figure 10 illustrates a case where the inputs arrive at the same time and the carry lookahead circuit is for bits [14:0]. In the Figure 11 example, there is a carry lookahead circuit for bits [14:0] and for bits [30:15]. In the configuration of Figure 12 there is no carry lookahead circuit; it is the combination of the CIA for bits [30:0] and the CSA for bits [40:31].
[0032] Figure 13 illustrates a more general high-speed adder using CIA, CLA and CSA adders. It is assumed that d > c > b > a > 0. If bits [d:c+1] are assigned as the sign bits, then, as discussed above, CSA[d:c+1] (represented by adder 91) is a good choice for detecting overflow/underflow faster; this is used for the MSB block 91. The middle blocks [c:b+1] (block 92), [b:a+1] (block 93) and [a:0] (block 94) are carry increment adders (CIA), and these are used with the carry lookahead CLA circuits 95. There can be multiple carry lookahead circuits for each CIA block. The CIA can also use a hierarchical approach for speed-up.
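A behavioral sketch of the general Figure 13 structure, assuming one lookahead stage that derives each block's carry-in from group generate/propagate signals (the source does not specify the internals of CLA circuits 95) and hypothetical function names. The block boundaries are parameterized by a, b, c, d; the example call uses (14, 24, 30, 40), an arbitrary choice for a 41-bit adder.

```python
def build_fig13_blocks(a, b, c, d):
    """Blocks of Figure 13 as (kind, msb, lsb), least significant first."""
    if not d > c > b > a > 0:
        raise ValueError("Figure 13 assumes d > c > b > a > 0")
    return [("CIA", a, 0), ("CIA", b, a + 1), ("CIA", c, b + 1), ("CSA", d, c + 1)]


def group_gp(x, y, msb, lsb):
    """Group generate/propagate over bits [msb:lsb] (input to a CLA circuit)."""
    G, P = 0, 1
    for i in range(lsb, msb + 1):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        G = (xi & yi) | ((xi ^ yi) & G)
        P &= xi ^ yi
    return G, P


def fig13_add(x, y, a, b, c, d):
    blocks = build_fig13_blocks(a, b, c, d)
    # CLA circuits 95: block carry-ins from group G/P, not from block sums.
    carry_in = [0]
    for _, msb, lsb in blocks[:-1]:
        G, P = group_gp(x, y, msb, lsb)
        carry_in.append(G | (P & carry_in[-1]))
    result, carry_out = 0, 0
    for (kind, msb, lsb), ck in zip(blocks, carry_in):
        width = msb - lsb + 1
        mask = (1 << width) - 1
        xs, ys = (x >> lsb) & mask, (y >> lsb) & mask
        if kind == "CIA":
            total = xs + ys           # carry-in-0 result ...
            if ck:
                total += 1            # ... incremented afterwards
        else:
            total = (xs + ys + 1) if ck else (xs + ys)  # CSA select for sign block
        result |= (total & mask) << lsb
        carry_out = total >> width
    return result, carry_out


print(fig13_add((1 << 41) - 1, 1, 14, 24, 30, 40))  # (0, 1)
```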

[0033] The number of bits and the block separations depend on circuit optimization, input signal delays, etc. Various modifications of the embodiments of the present invention will be apparent to those skilled in the relevant art, and various modifications, additions, substitutions and the like can be made without departing from the spirit of the invention; these are therefore considered to be within the scope of the invention as defined in the following claims.

Attorney Docket 032350.B344
TI-33252